University of Texas at Austin

Past Event: Oden Institute Seminar

Reinforcement Learning with Partial Observations - TO BE RESCHEDULED

Michael Zavlanos, Professor, Department of Mechanical Engineering and Materials Science, Duke University

11AM – 1PM
Tuesday, Oct 25, 2022

POB 6.304

Abstract

Reinforcement learning (RL) has been widely used to solve sequential decision-making problems in unknown stochastic environments. In this talk, we present a new derivative-free (zeroth-order) policy optimization method for Multi-Agent Reinforcement Learning (MARL) with partial state and action observations, as well as for online learning in non-stationary environments. Zeroth-order optimization methods enable the optimization of black-box models that are available only in the form of input-output data; such models are common in the training of deep neural networks and in RL. In the absence of explicit models, exact first- or second-order information (the gradient or Hessian) is unavailable and cannot be used for optimization, so zeroth-order methods rely on input-output data to obtain gradient approximations that can serve as descent directions. We present a new one-point policy gradient estimator, which we have recently developed, that estimates the gradient using a single function evaluation per iteration by taking the residual between two consecutive feedback points. We refer to this scheme as residual feedback. We show that residual feedback can be used to develop MARL methods with partial state and action observations, as it allows the agents to compute the local policy gradients needed to update their local policy functions using local estimates of the global accumulated rewards. We also analyze the performance of the proposed residual-feedback estimator for online learning, where one-point policy gradient estimation is the only viable choice. We show that, in both MARL and online learning, residual feedback induces a smaller estimation variance than other one-point feedback methods and therefore improves the learning rate.
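Although the abstract gives no pseudocode, the residual-feedback estimator it describes admits a compact form: with smoothing radius delta and a direction u_t drawn uniformly from the unit sphere, the gradient of a black-box objective f at x_t is estimated as g_t = (d/delta) * (f(x_t + delta*u_t) - f(x_{t-1} + delta*u_{t-1})) * u_t, so only one new evaluation of f is needed per iteration because the previous evaluation is reused as a baseline. Below is a minimal Python sketch of this update applied to a generic minimization problem; the function names, step size, and smoothing radius are illustrative assumptions, not the speaker's implementation.

import numpy as np

def sphere_direction(d, rng):
    # Draw a direction uniformly at random from the unit sphere in R^d.
    u = rng.standard_normal(d)
    return u / np.linalg.norm(u)

def residual_feedback_minimize(f, x0, steps=5000, delta=0.1, lr=0.001, seed=0):
    # Zeroth-order minimization with the one-point residual-feedback
    # gradient estimator described in the abstract:
    #   g_t = (d/delta) * (f(x_t + delta*u_t) - f(x_{t-1} + delta*u_{t-1})) * u_t
    # Only one new evaluation of f is made per iteration; the previous
    # perturbed evaluation is reused as a baseline. (Step size and smoothing
    # radius here are illustrative assumptions, not values from the talk.)
    rng = np.random.default_rng(seed)
    d = x0.size
    x = x0.astype(float)
    u = sphere_direction(d, rng)
    f_prev = f(x + delta * u)  # one extra evaluation to initialize the residual
    for _ in range(steps):
        u = sphere_direction(d, rng)
        f_curr = f(x + delta * u)                 # the single new function query
        g = (d / delta) * (f_curr - f_prev) * u   # residual-feedback estimate
        x = x - lr * g                            # descent step along the estimate
        f_prev = f_curr                           # reuse this evaluation next time
    return x

# Toy usage: treat a simple quadratic as a black box and minimize it.
f = lambda x: float(np.sum((x - 1.0) ** 2))
x_star = residual_feedback_minimize(f, x0=np.zeros(5))
print(x_star)  # entries should be close to 1, the true minimizer

Because consecutive evaluations are strongly correlated, subtracting the previous one cancels much of the shared value of f; this is the variance reduction, relative to classical one-point estimators, that the abstract refers to.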

Biography

Michael M. Zavlanos is the Yoh Family Professor in the Department of Mechanical Engineering and Materials Science at Duke University, Durham, NC. He also holds secondary appointments in the Department of Electrical and Computer Engineering and the Department of Computer Science, and he is currently an Amazon Scholar with Amazon Robotics, North Reading, MA. His research focuses on control theory, optimization, learning, and AI, with applications in robotics and autonomous systems, cyber-physical systems, and healthcare/medicine.

He is a recipient of various awards, including the 2014 Office of Naval Research Young Investigator Program (YIP) Award and the 2011 National Science Foundation Faculty Early Career Development (CAREER) Award.

Michael M. Zavlanos received the Diploma in mechanical engineering from the National Technical University of Athens (NTUA), Athens, Greece, in 2002, and the M.S.E. and Ph.D. degrees in electrical and systems engineering from the University of Pennsylvania, Philadelphia, PA, in 2005 and 2008, respectively.


Event information

Date: Tuesday, Oct 25, 2022, 11AM – 1PM
Location: POB 6.304
Hosted by: Ufuk Topcu